Annotation of Clinical Narratives in Bulgarian language
Abstract
In this paper we describe the process of annotating clinical texts with morphosyntactic and semantic information. The corpus contains 1,300 discharge letters in Bulgarian for patients with endocrine and metabolic disorders. The annotated corpus will serve as a gold standard for evaluating information extraction on a test corpus of 6,200 discharge letters. The annotation is performed within the Clark system, an XML-based system for corpora development, which provides a mechanism for semi-automatic annotation. First, a pipeline for Bulgarian morphosyntactic annotation and a cascaded regular grammar for semantic annotation are run; then rules for cleaning up frequent errors are applied. Finally, the result is checked manually. Our goal is also to adapt the morphosyntactic tagger to the domain of clinical narratives.
Similar resources
Mining Association Rules from Clinical Narratives
We propose a method that processes raw informal medical texts (from health forums) and formal texts (outpatient records) in Bulgarian language in order to extract typical word co-occurrences in the form of association rules. When mining these rules we use some context information and small terminological lexicons to generalize the extracted frequent patterns. This allows to study informal expre...
Anaphora - Clause Annotation and Alignment Tool
The paper presents Anaphora – an OS and language independent tool for clause annotation and alignment, developed at the Department of Computational Linguistics, Institute for Bulgarian Language, Bulgarian Academy of Sciences. The tool supports automated sentence splitting and alignment and modes for manual monolingual annotation and multilingual alignment of sentences and clauses. Anaphora has ...
Design of an extensive information representation scheme for clinical narratives
BACKGROUND Knowledge representation frameworks are essential to the understanding of complex biomedical processes, and to the analysis of biomedical texts that describe them. Combined with natural language processing (NLP), they have the potential to contribute to retrospective studies by unlocking important phenotyping information contained in the narrative content of electronic health records...
Bulgarian Language Resources for Ontology-Based Semantic Search
This paper presents the language resources, which would facilitate the ontology-based semantic search. Some of these resources are language independent, such as the domain ontology. Some depend on the specific language: terminological lexicons, annotation grammars, sense disambiguation rules, gold standard corpus. Here we focus on the Bulgarian resources constructed in two domains for supportin...
Bulgarian X-language Parallel Corpus
The paper presents the methodology and the outcome of the compilation and the processing of the Bulgarian X-language Parallel Corpus (Bul-X-Cor) which was integrated as part of the Bulgarian National Corpus (BulNC). We focus on building representative parallel corpora which include a diversity of domains and genres, reflect the relations between Bulgarian and other languages and are consistent ...